French patent FR3021428A1: MULTIPLICATION OF BIT MATRICES USING EXPLICIT REGISTERS

Abstract:
The invention relates to a processor comprising, in its instruction set, a bit matrix multiplication instruction (sbmm) having a first double-precision operand (A) representing a first matrix to be multiplied, a second operand (B) explicitly designating any two single-precision work registers whose joined contents represent a second matrix to be multiplied, and a destination parameter (C) explicitly designating any two single-precision work registers to jointly hold a matrix representing the result of the multiplication.
Publication number: FR3021428A1
Application number: FR1454683
Filing date: 2014-05-23
Publication date: 2015-11-27
Inventors: Benoît Dupont de Dinechin; Marta Rybczynska
Applicant: Kalray SA
Main IPC class:

Description:

[0001] TECHNICAL FIELD
The invention relates to a processor having data reorganization functionalities, in particular using a bit matrix multiplication unit.
BACKGROUND
A bit matrix multiplication unit, often referred to as a BMM (Bit-Matrix Multiply) unit, allows data to be reorganized in a single instruction cycle. Many types of reorganization are possible, down to reordering the individual bits of the processed data. The article [Yedidya Hilewitz et al., "Bit Matrix Multiplication in Commodity Processors", IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2008] describes applications of BMM units. In practice, a BMM operator is used with one of the operands set to a constant value that defines a particular operation on the content of the other operand. Constants chosen for the first operand make it possible to permute rows of the matrix associated with the second operand, that is to say, to permute the words represented by the rows. Constants chosen for the second operand make it possible to permute columns of the matrix associated with the first operand, that is to say, to permute the bits of all the rows of the matrix according to the same pattern. However, a BMM unit reaches its limits in efficiency when reorganizations mix data from multiple matrices.
SUMMARY
A processor is generally provided comprising, in its instruction set, a bit matrix multiplication instruction having a first double-precision operand representing a first matrix to be multiplied, a second operand explicitly designating any two single-precision work registers whose joined contents represent a second matrix to be multiplied, and a destination parameter explicitly designating any two single-precision work registers to jointly hold a matrix representing the result of the multiplication.
[0002] The processor may comprise a set of single-precision work registers adapted, on reading, to join the contents of two individually selected registers into an outgoing double-precision word and, on writing, to split an incoming double-precision word into two individually selected registers; a bit matrix multiplication unit adapted to receive two multiplicand matrices in the form of double-precision words and to write a result matrix in the form of a double-precision word into the set of registers; and an instruction processing unit adapted, upon executing a bit matrix multiplication instruction, to: - supply the first operand directly as a first of the two multiplicands of the bit matrix multiplication unit, - use the second operand to read from the set of registers the second multiplicand of the bit matrix multiplication unit, and - use the destination parameter to write into the set of registers the result provided by the bit matrix multiplication unit.
[0003] The bit matrix multiplication unit may further be adapted to respond to a variant of the bit matrix multiplication instruction by providing a double-precision result corresponding to the transposed matrix of the result of the multiplication. There is also provided a method of multiplying bit matrices, comprising the steps of representing bit matrices by double-precision words; reading two individually selected registers in a set of single-precision work registers; joining the contents of the two registers read to form a first multiplicand matrix; multiplying the first multiplicand matrix by a second multiplicand matrix; splitting the result of the multiplication into two single-precision words; and writing the two single-precision words into two individually selected registers of the register set. The method may further include the steps of defining the second multiplicand matrix directly in a first operand of a bit matrix multiplication instruction; defining the registers used to form the first multiplicand matrix in a second operand of the bit matrix multiplication instruction; and defining the registers that receive the result of the multiplication in a destination parameter of the bit matrix multiplication instruction.
[0004] BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments will be set forth in the following description, given in a nonlimiting manner in relation to the appended figures, among which: FIG. 1 is a block diagram of a BMM unit designed to process 8x8-bit matrices; FIG. 2 illustrates the operation of a BMM unit; FIG. 3 is a block diagram of a BMM unit associated with a register bank of particular structure in a processor; FIG. 4 illustrates an operation interleaving words from two packets; FIGS. 5A to 5C illustrate data at various stages of the execution of a particular BMM instruction used to process the reorganization of FIG. 4; FIG. 6 illustrates a bit interleaving operation on two packets; and FIGS. 7A to 7D illustrate data at various stages of the execution of two particular BMM instructions used to process the reorganization of FIG. 6.
DESCRIPTION OF EMBODIMENTS
FIG. 1 is a block diagram of a BMM unit. The matrices handled are generally square, with a size that conforms to one of the precision formats handled by the processor. Hereinafter, by way of example, a 32-bit processor handling a "single" precision of 32 bits and a "double" precision of 64 bits is considered. In this context, the matrices are 8x8-bit matrices, each of which can be represented by a double-precision word (64 bits). The BMM unit thus receives two 64-bit words A and B representing two 8x8-bit multiplicand matrices MATa and MATb. The matrices MATa and MATb are multiplied by a hardwired circuit MMULT to produce an 8x8-bit result matrix MATc. This matrix MATc is provided by the BMM unit in the form of a 64-bit word C.
[0005] It is considered hereinafter that the bytes forming a 64-bit word representing a matrix are arranged by increasing weight from the first row to the eighth row of the matrix, and that the bits of each byte are arranged by decreasing weight from the first column to the eighth column of the matrix. Thus, if the bit of weight j of the byte of weight i of a 64-bit word is denoted b_{ij}, the corresponding matrix is expressed by:

$$
\begin{pmatrix}
b_{07} & b_{06} & b_{05} & b_{04} & b_{03} & b_{02} & b_{01} & b_{00} \\
b_{17} & b_{16} & b_{15} & b_{14} & b_{13} & b_{12} & b_{11} & b_{10} \\
b_{27} & b_{26} & b_{25} & b_{24} & b_{23} & b_{22} & b_{21} & b_{20} \\
b_{37} & b_{36} & b_{35} & b_{34} & b_{33} & b_{32} & b_{31} & b_{30} \\
b_{47} & b_{46} & b_{45} & b_{44} & b_{43} & b_{42} & b_{41} & b_{40} \\
b_{57} & b_{56} & b_{55} & b_{54} & b_{53} & b_{52} & b_{51} & b_{50} \\
b_{67} & b_{66} & b_{65} & b_{64} & b_{63} & b_{62} & b_{61} & b_{60} \\
b_{77} & b_{76} & b_{75} & b_{74} & b_{73} & b_{72} & b_{71} & b_{70}
\end{pmatrix}
$$

As previously stated, one of the operands receives a constant that defines the reorganization to be carried out on the other, variable, operand. If the BMM unit performs the multiplication AxB, choosing a constant for operand A defines a reorganization of the rows of the matrix associated with operand B, that is to say, of the bytes forming operand B. Choosing a constant for operand B defines a reorganization of the columns of the matrix associated with operand A. The cases considered hereinafter by way of example can be processed using constants for operand A. A particular constant is the one associated with the identity matrix, comprising only 1s on the first diagonal. For an 8x8 matrix representing 64-bit operands, the identity matrix is expressed in hexadecimal by: MID = 0x80 40 20 10 08 04 02 01. Each pair of digits of a constant represents a byte, or row of the matrix, the least significant byte (0x01) corresponding to the first row of the matrix. From this constant MID, one can construct a hexadecimal constant MOP defining an arbitrary byte permutation operation. The weights of the bytes of the operands and of the result are considered to increase from right to left, starting from 0.
Then, if position i of the constant MOP contains the value 0xXY, where 0xXY is the content of position j of the constant MID, the operation produces a result C whose byte at position i receives the byte at position j of the second operand B.
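The byte-selection rule just stated can be sketched in Python. This is a minimal behavioural model, not the hardwired MMULT circuit: the function names `byte_at` and `bmm` are illustrative, and combining the selected bytes with XOR (arithmetic modulo 2) is an assumption, since the text only describes constants that select a single byte per row.

```python
def byte_at(word, i):
    """Return the byte of weight i (0 = least significant) of a 64-bit word."""
    return (word >> (8 * i)) & 0xFF


def bmm(a, b):
    """Behavioural model of C = A x B on 8x8 bit matrices packed in 64-bit words.

    Assumption: bit j of byte i of A selects byte j of B for byte i of C,
    and several selected bytes are combined with XOR.
    """
    c = 0
    for i in range(8):
        row = 0
        for j in range(8):
            if (byte_at(a, i) >> j) & 1:
                row ^= byte_at(b, j)
        c |= row << (8 * i)
    return c


MID = 0x8040201008040201  # identity constant: byte i holds 1 << i

# Test operand whose byte of weight k is 0x11 * k, so bytes are easy to track.
B = 0x7766554433221100

# Start from MID and place 0x20 (the content of position 5 of MID) at position 2.
MOP = (MID & ~(0xFF << 16)) | (0x20 << 16)

print(f"{bmm(MID, B):016x}")  # identity leaves B unchanged
print(f"{bmm(MOP, B):016x}")  # byte 2 of the result now holds byte 5 of B
```

With this test operand, the second result differs from B only in its byte of weight 2, which receives 0x55, the byte of weight 5 of B.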
[0006] Figure 2 illustrates this feature using an example. It is considered that the first operand A receives a constant MOP whose byte of weight 2 contains the value 0x20. The operand B receives eight bytes B0 to B7. The byte of value 0x20 of the constant MOP identifies weight 5, since 0x20 is the content of position 5 of the constant MID. Under these conditions, the byte of weight 5, B5, of operand B is placed at weight 2 of the result C. According to this technique, several useful constants can be defined, for example:
- 0x01 02 04 08 10 20 40 80: reversing the byte order of the second operand B,
- 0x80 40 08 04 20 10 02 01: interleaving the 16-bit words of the two 32-bit words forming operand B,
- 0x80 08 40 04 20 02 10 01: interleaving the bytes of the two 32-bit words forming operand B, etc.
Any reorganization of bytes within the same operand B is thus possible by appropriately constructing a constant MOP for operand A. However, there are situations where it is desired to reorganize data in a data sequence that is too large for the matrices processed by the BMM unit. The data sequence can then be divided into several packets of the size of the matrices, and each packet can be processed in turn by the BMM unit. If data from two consecutive packets must be mixed in a single multiplication result C, it may be necessary to carry out several transfers between work registers to prepare the operands to be supplied to the BMM unit. FIG. 3 is a partial block diagram of a processor embodiment provided with a BMM unit that makes it possible, with the aid of specific BMM instructions, to reduce the number of instruction cycles necessary to mix data from several packets.
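As an illustration of how such constants are constructed, the following Python sketch builds a MOP constant from a list stating, for each result byte, which byte of B it should receive. The helper `mop_from_sources` and the constant names are hypothetical, and `bmm` is the same assumed behavioural model of the BMM unit (bit j of byte i of A selects byte j of B), not the circuit itself.

```python
def bmm(a, b):
    """Assumed BMM model: bit j of byte i of A selects byte j of B (XOR-combined)."""
    c = 0
    for i in range(8):
        sel = (a >> (8 * i)) & 0xFF
        row = 0
        for j in range(8):
            if (sel >> j) & 1:
                row ^= (b >> (8 * j)) & 0xFF
        c |= row << (8 * i)
    return c


def mop_from_sources(src):
    """Build a MOP constant so that result byte i receives byte src[i] of B.

    Position i of MOP gets the content of position src[i] of MID, i.e. 1 << src[i].
    """
    mop = 0
    for i, j in enumerate(src):
        mop |= (1 << j) << (8 * i)
    return mop


# The three constants listed in the text, rebuilt from their byte mappings.
REVERSE_BYTES = mop_from_sources([7, 6, 5, 4, 3, 2, 1, 0])
INTERLEAVE_16 = mop_from_sources([0, 1, 4, 5, 2, 3, 6, 7])
INTERLEAVE_8 = mop_from_sources([0, 4, 1, 5, 2, 6, 3, 7])

B = 0x7766554433221100  # byte of weight k is 0x11 * k

for name, mop in [("reverse", REVERSE_BYTES),
                  ("interleave 16-bit", INTERLEAVE_16),
                  ("interleave bytes", INTERLEAVE_8)]:
    print(f"{name:18s} MOP={mop:016x} C={bmm(mop, B):016x}")
```

Reading a constant from right to left gives the source byte of each result position, which is why the byte-reversal constant is simply MID written backwards.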
[0007] The processor includes a set of work registers REGS. The size of the registers is adapted to the processor. In the context of a 32-bit processor, the registers also have a size of 32 bits, corresponding to single precision. The BMM unit, on the other hand, is designed to process double-precision words (64 bits). The register set is associated with a control circuit CTRL, which may be designed to simultaneously provide the contents of a register pair as a 64-bit multiplicand to the BMM unit. The control circuit can also be designed to write the double-precision result of the multiplication into a pair of registers.
[0008] In a conventional processor architecture designed to use a pair of registers to handle double-precision data, the instructions identify only the first register of the pair. The second register of the pair is implicitly the register of the next rank in the register addressing system.
[0009] Thus, instructions handling double-precision data can only identify even-rank registers, knowing that odd-rank registers are reserved to implicitly form pairs with the registers identified by the instructions. The architecture of Figure 3 is designed to allow instructions to explicitly identify any two registers to form a pair containing a double-precision word. In particular, a BMM instruction is provided that explicitly identifies each of the two registers that together contain an operand, and each of the two registers that jointly receive the result. More specifically, to carry out a matrix multiplication of the type C = MOPxB, provision may be made for a BMM instruction denoted:
bmm $rx:$ry, MOP, $ri:$rj
where "$rx:$ry" designates the addresses of the two registers used to receive the result C, and "$ri:$rj" designates the addresses of the two registers that contain the multiplicand B. Notations without the "$" sign designate the contents of the registers. The notation "MOP" designates a double-precision immediate constant conveyed in the instruction, which is provided directly as multiplicand A of the BMM unit. It is assumed that the first register of each pair contains the least significant bytes, and the second register the most significant bytes. The constant MOP is expressed in the same format; for example, if the constant MOP is chosen equal to the identity matrix MID, it is expressed as MOP = 0x08040201:0x80402010. An instruction register 30 is designed to provide in parallel the register addresses ($ri, $rj, $rx, $ry) conveyed by the bmm instruction to the control circuit CTRL, and the operand MOP as multiplicand A to the BMM unit.
The control circuit is then designed to join the contents of the registers $ri and $rj to form the multiplicand B of the BMM unit, and to split the result C of the multiplication between the two registers $rx and $ry (the least significant bytes being associated with the first register of each pair).
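The join and split performed by the control circuit can be modelled in a few lines of Python. This is a sketch under stated assumptions: the register file is represented as a plain list, the names `join`, `split`, `bmm_unit` and `bmm` are illustrative, and the BMM unit itself is the assumed byte-selection model (bit j of byte i of A selects byte j of B) rather than the actual circuit.

```python
M32 = 0xFFFFFFFF


def bmm_unit(a, b):
    """Assumed BMM model: bit j of byte i of A selects byte j of B (XOR-combined)."""
    c = 0
    for i in range(8):
        sel = (a >> (8 * i)) & 0xFF
        row = 0
        for j in range(8):
            if (sel >> j) & 1:
                row ^= (b >> (8 * j)) & 0xFF
        c |= row << (8 * i)
    return c


def join(lo, hi):
    """Join two 32-bit registers; the first register holds the low bytes."""
    return ((hi & M32) << 32) | (lo & M32)


def split(word):
    """Split a 64-bit word into (low register, high register)."""
    return word & M32, (word >> 32) & M32


def bmm(regs, dst, mop, src):
    """Model of 'bmm $rx:$ry, MOP, $ri:$rj' with dst=(x, y) and src=(i, j)."""
    b = join(regs[src[0]], regs[src[1]])
    regs[dst[0]], regs[dst[1]] = split(bmm_unit(mop, b))


# MID written in the pair notation of the text, 0x08040201:0x80402010.
MID = join(0x08040201, 0x80402010)

regs = [0] * 16  # hypothetical register file $r0..$r15
regs[0], regs[1], regs[2], regs[3] = 0x33221100, 0x77665544, 0xBBAA9988, 0xFFEEDDCC

# Any two registers may form the source pair, here $r0:$r2 (not consecutive).
bmm(regs, (10, 11), MID, (0, 2))
bmm(regs, (12, 13), 0x0102040810204080, (0, 2))  # byte-reversal constant
print([f"{r:08x}" for r in regs[10:14]])
```

The point of the sketch is the addressing: source and destination pairs are given as two independent register numbers, so no register move is needed before pairing $r0 with $r2.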
[0010] In some architectures, operand A of the bmm instruction may also be of register type and, like operand B, identify an explicit pair of registers $ru:$rv. In fact, in many processor architectures, immediate values conveyed in instructions are used as the last parameter. It is then preferred to use an instruction of the type:
sbmm $rx:$ry, $ri:$rj, MOP
where sbmm C, B, A produces the same result as bmm C, A, B. The BMM unit is not modified: the prefix "s", for "swapped", of the instruction sbmm simply means that the operands of the instruction are swapped by wiring relative to the multiplicands of the BMM unit. Figure 4 illustrates a first example of data reorganization, consisting of interleaving the 16-bit words of two packets of four 16-bit words. This type of reorganization may be useful in SIMD (Single Instruction, Multiple Data) processors. In a SIMD processor, a single instruction is provided simultaneously to a plurality of similar processing units that simultaneously process respective data "paths". In many situations, it is desired at certain stages of processing to route the data to different paths. The words of the first packet WA are designated wa0 to wa3, and the words of the second packet WB wb0 to wb3. The interleaving consists of producing a sequence WC of eight 16-bit words in which each word taken from packet WA is followed by the word of the same weight of packet WB, as shown. With the architecture of FIG. 3, such an operation can be performed using only two sbmm instructions. For example, assuming that the two word packets are initially contained in the registers $r0 to $r3:
sbmm $r10:$r11, $r0:$r2, 0x20100201:0x80400804
sbmm $r12:$r13, $r1:$r3, 0x20100201:0x80400804
FIGS. 5A to 5C illustrate in more detail the operation of these instructions. FIG. 5A shows the initial contents of the registers $r0 to $r3, in the form of 8x8-bit matrices in the format previously specified.
The cells of the matrices simply contain the indices ij of the bits. The register pair $r0:$r1 contains the word packet WA, its indices being shown in bold. The register pair $r2:$r3 contains the word packet WB, its indices being shown in italics. Each 16-bit word occupies two consecutive rows of the corresponding matrix. The contents of the first registers of the pairs, associated with the least significant words, are located in the upper half of the matrices.
[0011] Note that each of the operands B of the sbmm instructions above designates a pair of registers that does not correspond to one of the matrices of FIG. 5A. In fact, each operand B designates a matrix constructed from one half of each of the two matrices of Figure 5A. Figure 5B shows the matrices used as operands B by the sbmm instructions. Being able to freely select the two registers used to form operand B makes it possible to carry out, at no additional cost in instruction cycles, a preliminary reorganization, in particular a reorganization involving the two packets WA and WB. More specifically, in FIG. 5B, operand B of the first sbmm instruction is formed from the first two words of packet WA, taken from register $r0, and from the first two words of packet WB, taken from register $r2. Operand B of the second sbmm instruction is formed from the last two words of packet WA, taken from register $r1, and the last two words of packet WB, taken from register $r3. To the right of the matrices, the identity constant MID is shown in correspondence with the rows of the matrices.
[0012] In FIG. 5C, the two sbmm instructions have been executed and the results written into the register pairs $r10:$r11 and $r12:$r13, respectively. The reorganization essentially consisted in leaving the first two rows and the last two rows in place and exchanging the last two rows of the first register with the first two rows of the second register.
[0013] The value of the MOP constant is indicated on the right in correspondence with the rows of the matrices. It can be seen that the digit pairs of the MID constant of FIG. 5B "followed" their respective rows in FIG. 5C to form the MOP constant.
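The word interleaving of FIGS. 4 and 5A to 5C can be reproduced with a small Python model. This is a sketch, not the processor's implementation: the register file is a plain list, the word values are arbitrary test data, `sbmm` and `bmm_unit` are illustrative names, and the BMM unit is the assumed byte-selection model (bit j of byte i of A selects byte j of B).

```python
M32 = 0xFFFFFFFF


def bmm_unit(a, b):
    """Assumed BMM model: bit j of byte i of A selects byte j of B (XOR-combined)."""
    c = 0
    for i in range(8):
        sel = (a >> (8 * i)) & 0xFF
        row = 0
        for j in range(8):
            if (sel >> j) & 1:
                row ^= (b >> (8 * j)) & 0xFF
        c |= row << (8 * i)
    return c


def sbmm(regs, dst, src, mop):
    """Model of 'sbmm $rx:$ry, $ri:$rj, MOP' (first register = low bytes)."""
    b = (regs[src[1]] << 32) | regs[src[0]]
    c = bmm_unit(mop, b)
    regs[dst[0]], regs[dst[1]] = c & M32, c >> 32


# Arbitrary test values for the two packets of four 16-bit words.
wa = [0xA000, 0xA111, 0xA222, 0xA333]
wb = [0xB000, 0xB111, 0xB222, 0xB333]

regs = [0] * 16                      # hypothetical register file $r0..$r15
regs[0] = wa[0] | (wa[1] << 16)      # $r0:$r1 holds packet WA
regs[1] = wa[2] | (wa[3] << 16)
regs[2] = wb[0] | (wb[1] << 16)      # $r2:$r3 holds packet WB
regs[3] = wb[2] | (wb[3] << 16)

# MOP in pair notation 0x20100201:0x80400804 (low register first).
MOP = (0x80400804 << 32) | 0x20100201

sbmm(regs, (10, 11), (0, 2), MOP)    # sbmm $r10:$r11, $r0:$r2, MOP
sbmm(regs, (12, 13), (1, 3), MOP)    # sbmm $r12:$r13, $r1:$r3, MOP

# Read the eight 16-bit result words by increasing weight.
wc = [(r >> s) & 0xFFFF for r in regs[10:14] for s in (0, 16)]
print([f"{w:04x}" for w in wc])
```

The result list alternates words from WA and WB, and the two instructions never required $r1 and $r2 to be exchanged beforehand.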
[0014] In a conventional architecture using implicit register pairs to convey double-precision data, none of the matrices of FIG. 5B can be used directly as operand B of a BMM instruction. Indeed, the registers of the pairs do not have consecutive addresses. The contents of the registers $r1 and $r2 of Figure 5A would first have to be exchanged, which involves executing three register-to-register write instructions through a temporary register. In other words, the same operation would require five instructions to process the two packets instead of two. To extend the capabilities of the processor, it is proposed to add an instruction, denoted sbmmt, performing both a BMM operation and a transposition of the result matrix C. In other words, executing the instruction:
sbmmt C, B, MOP
provides a result C that is the transposed matrix of the result C produced by the instruction sbmm C, B, MOP. Such an sbmmt instruction uses the same BMM unit as the sbmm instruction. The transposition is done simply by wiring at the output of the BMM unit. The instruction can be used to perform a simple transposition when the operand MOP is equal to the identity matrix MID. Figure 6 illustrates an example of reorganization where the sbmmt instruction is useful. This is a so-called "bit-slicing" operation, performed by way of example on a sequence of 16 bytes b0 to b15 to produce eight 16-bit words w0 to w7. The operation consists in grouping in a word wi the bits of weight i of the 16 bytes, by increasing weight of the bytes. Such an operation can be performed using only two sbmmt instructions and two sbmm instructions.
For example, assuming that the 16 bytes are initially contained, in increasing order, in the registers $r0 to $r3:
sbmmt $r0:$r1, $r0:$r1, 0x08040201:0x80402010
sbmmt $r2:$r3, $r2:$r3, 0x08040201:0x80402010
sbmm $r10:$r11, $r0:$r2, 0x20021001:0x80084004
sbmm $r12:$r13, $r1:$r3, 0x20021001:0x80084004
Figures 7A to 7D illustrate in more detail the operation of these instructions, in the same format as FIGS. 5A to 5C. The constant MOP of the first two sbmmt instructions is the identity matrix MID. As a result, these instructions perform a simple transposition.
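The effect of sbmmt with the identity constant can be sketched in Python as follows. The model is behavioural and rests on two assumptions: `bmm` is the byte-selection model of the BMM unit used for illustration, and `transpose` expresses the transposition in terms of bit weights (bit k of byte i of the result is bit i of byte k of the input), which is the reading consistent with the bit-slicing result described in the text.

```python
def bmm(a, b):
    """Assumed BMM model: bit j of byte i of A selects byte j of B (XOR-combined)."""
    c = 0
    for i in range(8):
        sel = (a >> (8 * i)) & 0xFF
        row = 0
        for j in range(8):
            if (sel >> j) & 1:
                row ^= (b >> (8 * j)) & 0xFF
        c |= row << (8 * i)
    return c


def transpose(word):
    """Transpose an 8x8 bit matrix packed in a 64-bit word.

    Assumption: bit i of byte k of the input becomes bit k of byte i.
    """
    out = 0
    for i in range(8):
        for k in range(8):
            if (word >> (8 * k + i)) & 1:
                out |= 1 << (8 * i + k)
    return out


def sbmmt(b, mop):
    """Model of the sbmmt data path: BMM followed by output transposition."""
    return transpose(bmm(mop, b))


MID = 0x8040201008040201

x = 0x00000000000000FF              # only the byte of weight 0 is set
print(f"{sbmmt(x, MID):016x}")      # every result byte receives one bit of that byte
```

After the transposition, byte i of the result gathers the bits of weight i of the eight input bytes, which is the first step of the bit-slicing operation.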
[0015] Figure 7A shows the initial contents of the registers $r0 to $r3, the register pair $r0:$r1 containing the bytes b0 to b7 of the sequence, and the register pair $r2:$r3 containing the bytes b8 to b15. Figure 7B illustrates the result of the sbmmt instructions, whose results are written back into the registers $r0 to $r3. The matrices of Figure 7A have been transposed.
[0016] The sbmmt instructions thus arrange the bits of the same weight of the bytes in the same row, and the bits of the same byte in the same column. The byte weights increase from right to left, and the bit weights increase from top to bottom. Fig. 7C shows the matrices used as operands B by the sbmm instructions. More specifically, operand B of the first sbmm instruction is formed by the contents of the register pair $r0:$r2, and operand B of the second sbmm instruction is formed by the contents of the register pair $r1:$r3. To the right of the matrices, the identity constant MID is shown in correspondence with the rows of the matrices. In Figure 7D, the two sbmm instructions have been executed and the results written into the register pairs $r10:$r11 and $r12:$r13, respectively. The value of the constant MOP is indicated on the right in correspondence with the rows of the matrices. It can be seen that the digit pairs of the constant MID of FIG. 7C have "followed" their respective rows in FIG. 7D to form the constant MOP. This constant is the one indicated above, performing the interleaving of the bytes of the two 32-bit words of operand B. In a conventional architecture using implicit register pairs to convey double-precision data, none of the matrices of Figure 7C can be used directly as operand B of a BMM instruction. Indeed, the registers of the pairs do not have consecutive addresses. The contents of the registers $r1 and $r2 of Figure 7B would first have to be exchanged, which involves executing three register-to-register write instructions through a temporary register. In other words, the same operation would require seven instructions to process the two packets instead of four. In certain processor architectures dedicated to cryptography, a processing unit dedicated to "bit-slicing" is provided, whose silicon area may be greater than that of a BMM unit.
The processor architecture described herein provides, using a single BMM unit and a set of instructions that can explicitly identify the registers to be used for double-precision data, generic and flexible data reordering functions that are not limited to particular technical fields.
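The complete bit-slicing sequence of FIGS. 6 and 7A to 7D (two sbmmt followed by two sbmm) can be checked against a direct definition with the Python sketch below. It is a behavioural model under stated assumptions: b0 is taken to be the least significant byte of $r0, `bmm` is the byte-selection model of the BMM unit, `transpose` maps bit i of byte k to bit k of byte i, and all function names are illustrative.

```python
M32 = 0xFFFFFFFF
MID = 0x8040201008040201               # 0x08040201:0x80402010
INTERLEAVE_BYTES = 0x8008400420021001  # 0x20021001:0x80084004


def bmm(a, b):
    """Assumed BMM model: bit j of byte i of A selects byte j of B (XOR-combined)."""
    c = 0
    for i in range(8):
        sel = (a >> (8 * i)) & 0xFF
        row = 0
        for j in range(8):
            if (sel >> j) & 1:
                row ^= (b >> (8 * j)) & 0xFF
        c |= row << (8 * i)
    return c


def transpose(word):
    """Assumed transposition: bit i of byte k becomes bit k of byte i."""
    out = 0
    for i in range(8):
        for k in range(8):
            if (word >> (8 * k + i)) & 1:
                out |= 1 << (8 * i + k)
    return out


def bit_slice(data):
    """Bit-slice 16 bytes b0..b15 into eight 16-bit words w0..w7."""
    # Load $r0..$r3, assuming b0 is the least significant byte of $r0.
    r = [int.from_bytes(data[4 * n:4 * n + 4], "little") for n in range(4)]
    # sbmmt $r0:$r1, $r0:$r1, MID  and  sbmmt $r2:$r3, $r2:$r3, MID
    t1 = transpose(bmm(MID, (r[1] << 32) | r[0]))
    t2 = transpose(bmm(MID, (r[3] << 32) | r[2]))
    r = [t1 & M32, t1 >> 32, t2 & M32, t2 >> 32]
    # sbmm $r10:$r11, $r0:$r2, MOP  and  sbmm $r12:$r13, $r1:$r3, MOP
    lo = bmm(INTERLEAVE_BYTES, (r[2] << 32) | r[0])
    hi = bmm(INTERLEAVE_BYTES, (r[3] << 32) | r[1])
    return [(w >> (16 * n)) & 0xFFFF for w in (lo, hi) for n in range(4)]


def bit_slice_reference(data):
    """Direct definition: word wi holds bit i of byte k at weight k."""
    return [sum(((data[k] >> i) & 1) << k for k in range(16)) for i in range(8)]


data = bytes((37 * k + 11) & 0xFF for k in range(16))  # arbitrary test bytes
print(bit_slice(data) == bit_slice_reference(data))
```

The comparison with `bit_slice_reference` is the useful part: it confirms that, under the stated assumptions, four operations on freely paired registers produce the words w0 to w7 without any intermediate register exchange.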

Claims (5):
1. A processor comprising, in its instruction set, a bit matrix multiplication instruction (sbmm) having: - a first double-precision operand (A) representing a first matrix to be multiplied, - a second operand (B) explicitly designating any two single-precision work registers whose joined contents represent a second matrix to be multiplied, and - a destination parameter (C) explicitly designating any two single-precision work registers to jointly hold a matrix representing the result of the multiplication.
2. A processor according to claim 1, comprising: - a set of single-precision work registers (REGS) adapted, on reading, to join the contents of two individually selected registers into an outgoing double-precision word and, on writing, to split an incoming double-precision word into two individually selected registers; - a bit matrix multiplication unit (BMM) arranged to receive two multiplicand matrices in the form of double-precision words and to write a result matrix in the form of a double-precision word into the set of registers; and - an instruction processing unit (30) designed, upon executing a bit matrix multiplication instruction, to: supply the first operand directly as a first of the two multiplicands of the bit matrix multiplication unit, use the second operand to read from the register set the second multiplicand of the bit matrix multiplication unit, and use the destination parameter to write into the set of registers the result provided by the bit matrix multiplication unit.
3. The processor of claim 2, wherein the bit matrix multiplication unit is further adapted to respond to a variant (sbmmt) of the bit matrix multiplication instruction by providing a double-precision result corresponding to the transposed matrix of the result of the multiplication.
4. A method of multiplying bit matrices, comprising the steps of: - representing bit matrices by double-precision words; - reading two individually selected registers in a set of single-precision work registers; - joining the contents of the two registers read to form a first multiplicand matrix (B); - multiplying the first multiplicand matrix by a second multiplicand matrix (A); - splitting the result (C) of the multiplication into two single-precision words; and - writing the two single-precision words into two individually selected registers of the register set.
5. A method according to claim 4, comprising the following steps: - defining the second multiplicand matrix (A) directly in a first operand of a bit matrix multiplication instruction; - defining the registers used to form the first multiplicand matrix in a second operand of the bit matrix multiplication instruction; and - defining the registers that receive the result of the multiplication in a destination parameter of the bit matrix multiplication instruction.

Patent family:

Publication number | Publication date

FR3021428B1|2017-10-13|

CN105117372A|2015-12-02|

US9898251B2|2018-02-20|

CN105117372B|2020-04-14|

EP2947562A1|2015-11-25|

US20150339101A1|2015-11-26|

Legal status:
2015-05-21| PLFP| Fee payment|Year of fee payment: 2 |

2015-11-27| PLSC| Search report ready|Effective date: 20151127 |

2016-05-18| PLFP| Fee payment|Year of fee payment: 3 |

2017-05-18| PLFP| Fee payment|Year of fee payment: 4 |

2018-05-22| PLFP| Fee payment|Year of fee payment: 5 |

2019-05-21| PLFP| Fee payment|Year of fee payment: 6 |

2020-05-25| PLFP| Fee payment|Year of fee payment: 7 |

2021-05-18| PLFP| Fee payment|Year of fee payment: 8 |

Priority:

Application number | Filing date | Title

FR1454683A| FR3021428B1|2014-05-23|2014-05-23|MULTIPLICATION OF BIT MATRICES USING EXPLICIT REGISTERS|

EP15167989.1A| EP2947562A1|2014-05-23|2015-05-18|Bit-matrix multiplication using explicit register|

US14/716,234| US9898251B2|2014-05-23|2015-05-19|Bit-matrix multiplication using explicit register|

CN201510409813.0A| CN105117372B|2014-05-23|2015-05-22|Bit matrix multiplication using explicit registers|
